Document Analysis at DFKI Part 2: Information Extraction

نویسندگان

  • Stephan Baumann
  • Michael Malburg
  • Hans-Günther Hein
  • Rainer Hoch
  • Thomas Kieninger
  • Norbert Kuhn
چکیده

Document analysis is responsible for an essential progress in office automation. This paper is part of an overview about the combined research efforts in document analysis at DFKI. Common to all document analysis projects is the global goal of providing a high level electronic representation of documents in terms of iconic, structural, textual, and semantic information. These symbolic document descriptions enable an “intelligent” access to a document database. Currently there are three ongoing document analysis projects at DFKI: INCA, OMEGA, and PASCAL2000/PASCAL+. Although the projects pursue different goals in different application domains, they all share the same problems which have to be resolved with similar techniques. For that reason the activities in these projects are bundled to avoid redundant work. At DFKI we have divided the problem of document analysis into two main tasks, text recognition and information extraction, which themselves are divided into a set of subtasks. In a series of three research reports the work of the document analysis and office automation department at DFKI is presented. The first report discusses the problem of text recognition, the second that of information extraction. In a third report we describe our concept for a specialized knowledge representation language for document analysis. The report in hand describes the activities dealing with the information extraction task. Information extraction covers the phases text analysis, message type identification and file integration.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Integration of Document Representation, Processing and Management

Intelligenz, DFKI) with sites in Kaiserslautern und Saarbrücken is a non-profit organization which was founded in 1988 by the shareholder companies ADV The DFKI conducts application-oriented basic research in the field of artificial intelligence and other related subfields of computer science. The overall goal is to construct systems with technical knowledge and common sense which-by using AI m...

متن کامل

Speciication and Underspeciication in Lexical Semantic Processing for Information Extraction

Processing for Information Extraction Paul Buitelaar DFKI, Language Technology Stuhlsatzenhausweg 3, 66123 Saarbr ucken, Germany

متن کامل

DFKI-LT at QA@CLEF 2007

This Working note shortly presents QUANTICO, a cross-language open domain question answering system for German and English document collections. The main features of the system are: use of preemptive off-line document annotation with information like Named Entities, sentence boundaries and pronominal anaphora resolution; online extraction of abbreviation-extension pairs and appositional constru...

متن کامل

DFKI at QA@CLEF 2008

This Working Note shortly presents QUANTICO, a cross-language open domain question answering system for German and English document collections. The main features of the system are: use of preemptive off-line document annotation with information like Named Entities, sentence boundaries and pronominal anaphora resolution; online extraction of abbreviation-extension pairs and appositional constru...

متن کامل

Document Analysis And Classification Based On Passing Window

In this paper we present Document analysis and classification system to segment and classify contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1995